Abstract. Studying event time series is a powerful approach for analyzing
the dynamics of complex dynamical systems in many fields of science. In
this paper, we describe the method of event coincidence analysis to provide
a framework for quantifying the strength, directionality and time lag of
statistical interrelationships between event series. Event coincidence
analysis allows one to formulate and test null hypotheses on the origin of
the observed interrelationships, including tests based on Poisson processes
or, more generally, stochastic point processes with a prescribed inter-event
time distribution and other higher-order properties. Applying the framework
to country-level observational data yields evidence that flood events have
acted as triggers of epidemic outbreaks globally since the 1950s. Facing
projected future changes in the statistics of climatic extreme events,
statistical techniques such as event coincidence analysis will be relevant
for investigating the impacts of anthropogenic climate change on human
societies and ecosystems worldwide.
(Date: April 7, 2016) 


1 Introduction 

Climate extremes and related natural disasters are of major interest for research on
climate change and its impacts, because their frequency and amplitude are projected
to increase significantly in the future [?]. However, when it comes to the
quantification of impacts of associated natural disasters on ecosystems [4] and society
(e.g. in terms of triggering epidemics or social unrest [?]), there are only very few
studies providing a systematic assessment beyond individual cases. In-depth studies in
this field require tailored statistical analysis tools that allow for a quantitative
characterization of statistical interdependencies between event time series and are
also applicable to series comprising only a few events.


Time series of events or event series, here defined as an ordered set of $N$ event
timings $\{t_1, \ldots, t_N\}$, are the subject of study in many fields of science. In
this paper, such event series are considered as binary, i.e. amplitudes associated with
the event timings $t_i$ are either not available or are not taken into account in the
analysis (corresponding to a description as unmarked point processes). There are many
real-world examples of event series of this type, including photon arrival times in
physics [?], neuronal spikes in neuroscience [?], exchange of messages on communication
networks in social science [?] or timings of climatic extreme events [?] and armed civil
conflicts [10,11,12] in climate impact studies [13]. Many recent studies have focused
on investigating statistical properties of single event series or point processes such as
inter-event time distributions. For example, the analysis of human online communication
reveals that waiting times between text messages do not follow an exponential
distribution as expected from an uncorrelated (Poissonian) random process, but include
bursts of frequent events interrupted by long periods of inactivity that can be
better described by power-law distributions [9].


However, less work appears to be available in the literature on quantifying and
systematically studying statistical interrelationships between two or more event
series, particularly when compared to the wide range of methods of this type available
for standard time series such as Pearson correlation, mutual information [15] or
synchronization measures [?]. Particularly in neuroscience, techniques have been
developed for measuring the similarity or synchrony of event series of neuronal spike
trains [7,8,17,18,19]. In climatology, measures of event synchronization have
recently been applied to study statistical interrelationships between extreme
precipitation events and their complex spatial structure [?] using climate network
approaches [?]. More specifically, this approach has been used to unravel the complex
spatio-temporal patterns of heavy rainfall events in the Indian monsoon domain [20,22],
derive predictors for extreme flood events in South America [53] and study regional
climatological phenomena related to extreme precipitation over Central Europe [24].


Measures of event synchronization tend to be used mostly in an explorative mode
of research aiming to reveal associations in large data sets of event series from
neuroscience or climatology. However, some of the currently most debated problems in
climate impact research, e.g. concerning climate-related variables such as extreme
temperatures or the El Niño-Southern Oscillation as potential drivers of armed civil
conflicts [?], call for a more in-depth analysis of statistical interrelations between
event series. Building on previously applied event synchronization approaches, in
this paper we formally put forward the alternative framework of event coincidence
analysis [25] for investigating in detail the statistical interrelationships between pairs
of event time series and testing hypotheses on the nature of these interrelationships.
Event coincidence analysis is designed to measure the strength, directionality and
time lag of statistical relations between event series. The method was introduced in a
less general setting to study possible statistical interrelationships between nonlinear
regime shifts in African paleoclimate during the past 5 million years and events in
hominin evolution such as the appearance and disappearance of species [25]. It has
also been applied to investigate the impacts of climatic extremes such as droughts
and heat waves on vegetation productivity based on observational data and dynamic
vegetation model runs [4]. Furthermore, event coincidence analysis has been used to
evaluate different hypotheses on socio-economic factors influencing the vulnerability
of countries to natural disasters with a focus on the possible triggering of outbreaks
of civil conflicts [5].

Fig. 1. Schematic illustration of event coincidence analysis for quantifying statistical
interrelationships between two event time series A and B for the case of precursor
coincidences. The assumption to be quantified and tested for is that events in B are
precursors of events in A (under the condition that an A-event has occurred). Focusing
on an event in series A (dark red bar), a (lagged) coincidence occurs with events in
series B (dark blue bars) if the latter fall into the coincidence interval of width
$\Delta T$ (grey bar) that can be shifted by a lag parameter $\tau$. Coincidence rates are
obtained by computing the relative frequency of occurrence of such coincidences for all
events in series A (Sect. 2.1).


We argue that event coincidence analysis is a particularly useful tool in the area of
climate impact studies, since it allows one to statistically study the effects of such
extreme events on other processes and explicitly takes their nature as event series into
account. To illustrate the capabilities of our approach, we employ event coincidence
analysis to assess extreme flood events as possible drivers of epidemics, extending
earlier work [?]. Applying the framework in this case study based on observational data
yields evidence that, from a globally aggregated perspective, flood events have acted
as drivers of epidemics in the same country in the past.

The structure of this paper is as follows: event coincidence analysis is thoroughly
introduced in Sect. 2, including descriptions of the basic methodology, statistical null
models for testing hypotheses and related approaches. Subsequently, the results of
applying event coincidence analysis to event series of extreme floods and epidemics
are reported in Sect. 3. Finally, Sect. 4 provides conclusions and perspectives for
promising future extensions of the event coincidence analysis methodology.


2 Methods 

In this section, we develop the method of event coincidence analysis, which is concerned
with quantifying the statistical interrelationships between pairs of event series,
extending the approach introduced in [25]. A pair of event time series A and B is
here defined as two ordered event sets with timings $\{t_1^A, \ldots, t_{N_A}^A\}$ and
$\{t_1^B, \ldots, t_{N_B}^B\}$ with numbers of events $N_A$, $N_B$, respectively. Both
event series are assumed to cover a time interval $(t_0, t_f)$ of length $T = t_f - t_0$,
such that $t_0 \le t_1^A \le \cdots \le t_{N_A}^A \le t_f$ and
$t_0 \le t_1^B \le \cdots \le t_{N_B}^B \le t_f$. This yields event rates
$\lambda_A = N_A/T$ and $\lambda_B = N_B/T$.

Event coincidence analysis is based on counting coincidences between events of 
different types. In the following, the assumption to be quantified and tested for is 
that events in B precede events in A, which is related to a possible causal influence 
from B- to A-type events (Fig. 1). The opposite case of assuming that events in
A precede events in B can be accommodated by exchanging the labels A and B 
throughout the formulae and text. 

An instantaneous coincidence is defined to occur if two events at $t_i^A$, $t_j^B$ with
$t_j^B \le t_i^A$ are closer in time than a temporal tolerance or coincidence interval
$\Delta T$, i.e. if

$$t_i^A - t_j^B \le \Delta T \qquad (1)$$

holds. In turn, a lagged coincidence is defined as an instantaneous coincidence between
the time-shifted event at $t_i^A - \tau$, where $\tau \ge 0$ is a time lag parameter, and
the event at $t_j^B \le t_i^A - \tau$, i.e. if the condition

$$(t_i^A - \tau) - t_j^B \le \Delta T \qquad (2)$$

is satisfied.

Differing from the above problem formulation, which is used consistently throughout
this work, we note that event coincidence analysis can also be performed by employing
coincidence intervals that are symmetric around A-events and relaxing the assumption
that B-events must precede A-events. The resulting condition
$|t_i^A - t_j^B| \le \Delta T$ can be meaningful, e.g. given event series with pronounced
dating uncertainties as in the case of archeological, paleontological and
paleoenvironmental data [25].

In the following, we introduce the concept of the coincidence rate between a single
pair of event series (Sect. 2.1) as well as an aggregated coincidence rate for taking
into account several pairs of event series (Sect. 2.2). Section 2.3 discusses coincidence
statistics for null models of stochastic point processes that can be used to test the
statistical significance of coincidence rates estimated from data. Moreover, we put
event coincidence analysis into the context of other related approaches for the analysis
of event time series (Sect. 2.4).

Many of the measures and significance tests described below are implemented in
the open source software package CoinCalc [26], written in the programming language
R, which is available at https://github.com/JonatanSiegmund/CoinCalc.


2.1 Coincidence rates for a pair of event series 


For quantifying the strength of statistical interrelationships between two event time
series A and B, we introduce two variants of coincidence rates addressing B-type
events as precursors and triggers of A-type events, respectively. In the first case, the
precursor coincidence rate

$$r_p(\Delta T, \tau) = \frac{1}{N_A} \sum_{i=1}^{N_A} \Theta\left[ \sum_{j=1}^{N_B} 1_{[0,\Delta T]}\left( (t_i^A - \tau) - t_j^B \right) \right] \qquad (3)$$

measures the fraction of A-type events that are preceded by at least one B-type event
(note that multiple B-type events within the coincidence interval are counted only
once, see also Fig. 1). Here, $\Theta(\cdot)$ denotes the Heaviside function (here defined
as $\Theta(x) = 0$ for $x \le 0$ and $\Theta(x) = 1$ otherwise) and $1_I(\cdot)$ the
indicator function of the interval $I$ (defined as $1_I(x) = 1$ for $x \in I$ and
$1_I(x) = 0$ otherwise). In the second case, the trigger coincidence rate

$$r_t(\Delta T, \tau) = \frac{1}{N_B} \sum_{j=1}^{N_B} \Theta\left[ \sum_{i=1}^{N_A} 1_{[0,\Delta T]}\left( (t_i^A - \tau) - t_j^B \right) \right] \qquad (4)$$

measures the fraction of B-type events that are followed by at least one A-type
event (note that multiple A-type events within the coincidence interval are counted
only once). Distinguishing between precursor and trigger coincidence rates introduces
a notion of directionality to the method of event coincidence analysis. Furthermore,
the parameter $\tau$ makes it possible to explicitly take into account lagged
relationships between event series.
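
As a minimal sketch of Eqs. (3) and (4), the two coincidence rates can be computed by checking, for each event, whether the other series has at least one event inside the (lagged) coincidence window. The function names and the toy event timings below are hypothetical and chosen only for illustration; this is not the CoinCalc implementation.

```python
from bisect import bisect_left, bisect_right


def count_window(sorted_times, lo, hi):
    """Number of entries t in sorted_times with lo <= t <= hi."""
    return bisect_right(sorted_times, hi) - bisect_left(sorted_times, lo)


def precursor_rate(t_a, t_b, delta_t, tau=0):
    """Fraction of A-events preceded by at least one B-event, cf. Eq. (3)."""
    t_b = sorted(t_b)
    # 0 <= (t_a - tau) - t_b <= delta_t  <=>  t_a - tau - delta_t <= t_b <= t_a - tau
    hits = sum(count_window(t_b, ta - tau - delta_t, ta - tau) > 0 for ta in t_a)
    return hits / len(t_a)


def trigger_rate(t_a, t_b, delta_t, tau=0):
    """Fraction of B-events followed by at least one A-event, cf. Eq. (4)."""
    t_a = sorted(t_a)
    # 0 <= (t_a - tau) - t_b <= delta_t  <=>  t_b + tau <= t_a <= t_b + tau + delta_t
    hits = sum(count_window(t_a, tb + tau, tb + tau + delta_t) > 0 for tb in t_b)
    return hits / len(t_b)


t_a = [5, 12, 30]       # hypothetical A-event timings
t_b = [3, 4, 11, 20]    # hypothetical B-event timings
r_p = precursor_rate(t_a, t_b, delta_t=2)  # two of three A-events have a precursor
r_t = trigger_rate(t_a, t_b, delta_t=2)    # three of four B-events are followed by A
```

Because each event contributes at most one coincidence, the two B-events at times 3 and 4 preceding the A-event at time 5 count only once towards the precursor rate, while both count separately towards the trigger rate.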

2.2 Aggregated coincidence rates


In some applications, it can be relevant to have at hand an integrated measure for
coincidences that occur between several pairs of event series as in the case of events
that are available on a spatial grid or for different regions or countries. For example,
consider multiple country-wise sets of A-type events (floods) and B-type events
(epidemic outbreaks) as in the application presented in Sect. 3. In this case,
coincidences can only be meaningfully counted on a per-country basis, but it is desirable
to quantify the aggregated coincidence rate over all countries in the data set or a
suitably filtered subset of countries to obtain a global measure of the strength of the
relationship between the two event types considered and its statistical significance with
respect to different null hypotheses [5]. Another motivation for considering aggregate
measures of coincidence relationships is related to data quality. In some applications
with small event numbers $N_A$, $N_B$, only aggregation over several pairs of event
series allows robust statistical conclusions to be drawn.

Analogously to the case of a single pair of event series, two flavors of aggregated
coincidence rates are defined as follows given a set $G$ of pairs of A- and B-type event
series, where pair $k \in G$ contains $N_{A,k}$ and $N_{B,k}$ events with timings
$t_i^{A,k}$ and $t_j^{B,k}$, respectively. The aggregated precursor coincidence rate

$$r_p^G(\Delta T, \tau) = \frac{\sum_{k \in G} \sum_{i=1}^{N_{A,k}} \Theta\left[ \sum_{j=1}^{N_{B,k}} 1_{[0,\Delta T]}\left( (t_i^{A,k} - \tau) - t_j^{B,k} \right) \right]}{\sum_{k \in G} N_{A,k}} \qquad (5)$$

measures the total number of precursor coincidences occurring in all pairs of event
series normalized by the maximum possible number of such coincidences. Along the
same lines, the aggregated trigger coincidence rate

$$r_t^G(\Delta T, \tau) = \frac{\sum_{k \in G} \sum_{j=1}^{N_{B,k}} \Theta\left[ \sum_{i=1}^{N_{A,k}} 1_{[0,\Delta T]}\left( (t_i^{A,k} - \tau) - t_j^{B,k} \right) \right]}{\sum_{k \in G} N_{B,k}} \qquad (6)$$

is the accordingly normalized total number of trigger coincidences occurring in all
pairs of event series in $G$. Note that for both types of aggregated coincidence rates,
multiple events falling within the coincidence window are counted only once, as in
the definition of coincidence rates for a single pair of event series (Sect. 2.1).
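
The aggregation in Eqs. (5) and (6) can be sketched as follows: coincidences are counted separately within each pair of event series and only the totals are normalized. The dictionary layout, the function names and the toy data are assumptions made for this illustration.

```python
def n_precursor(t_a, t_b, delta_t, tau=0):
    """Number of A-events with at least one B-event in the lagged window."""
    return sum(any(0 <= (ta - tau) - tb <= delta_t for tb in t_b) for ta in t_a)


def n_trigger(t_a, t_b, delta_t, tau=0):
    """Number of B-events followed by at least one A-event in the lagged window."""
    return sum(any(0 <= (ta - tau) - tb <= delta_t for ta in t_a) for tb in t_b)


def aggregated_rates(pairs, delta_t, tau=0):
    """pairs maps a key k to (t_a, t_b); returns the rates of Eqs. (5) and (6)."""
    k_p = sum(n_precursor(a, b, delta_t, tau) for a, b in pairs.values())
    k_t = sum(n_trigger(a, b, delta_t, tau) for a, b in pairs.values())
    n_a = sum(len(a) for a, _ in pairs.values())
    n_b = sum(len(b) for _, b in pairs.values())
    return k_p / n_a, k_t / n_b


pairs = {  # hypothetical per-country event timings (A-events, B-events)
    "country_1": ([5, 12, 30], [3, 4, 11, 20]),
    "country_2": ([7], [1, 6]),
    "country_3": ([], [10]),  # no A-events: only enters the trigger denominator
}
r_p_agg, r_t_agg = aggregated_rates(pairs, delta_t=2)
```

Note that events are never matched across different keys, which reflects the statement above that coincidences can only be meaningfully counted on a per-country basis.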

Studying aggregated coincidence rates can be seen as a first step towards a
systematic analysis of coincidences in more general spatio-temporal event data. Such
data can be conceptualized as being generated by spatial or spatio-temporal point
processes [?]. More generally, events of interest for an extended event coincidence
analysis can also take the form of higher dimensional objects with a nontrivial shape
in terms of, e.g. latitude, longitude and time, such as the spatio-temporal extremes
in the fraction of absorbed photosynthetically active radiation (fAPAR) identified by
Zscheischler et al. [29].


2.3 Statistics for null models of stochastic point processes 


Treating stochastic point processes as generators of event time series makes it possible
to derive distributions of coincidence rates to test the statistical significance of the
results of event coincidence analysis based on a hierarchy of null hypotheses, analogously
to classical statistics and the method of surrogates for standard time series
analysis [?]. Here, we focus on Poisson processes without temporal correlations between
events (Sect. 2.3.1) and point processes with a prescribed inter-event time distribution
$P(\Delta t)$ that make it possible to consider, e.g. processes with heavy-tailed
$P(\Delta t)$ that tend to produce bursts of events (Sect. 2.3.2). More generally,
classes of null models for point processes of interest include, for example, event series
with higher-order memory effects such as correlations between event bursts [?] that are,
however, beyond the scope of this paper. We also briefly touch upon the possibility of
constructing surrogate event series from time series surrogates (Sect. 2.3.3).

Analytical results are given below where available; otherwise we rely on Monte
Carlo simulations. For illustration, we restrict ourselves to a single pair of event time
series; the case of sets of event series can be treated analogously. Significance tests
based on the null hypothesis of Poisson processes following a Monte Carlo approach
are applied in Sect. 3 to quantify the statistical interrelationships between flood events
and epidemic outbreaks.


2.3.1 Poisson processes 


Here, we assume that both A- and B-type events are generated by Poisson processes
with event rates $\lambda_A$ and $\lambda_B$, respectively. This assumption implies that
both types of events are distributed randomly, independently and uniformly over the
continuous time interval of length $T$. Since our focus is on using the derived statistics
for hypothesis testing on data sets with typically small numbers of events in each series,
we assume fixed event numbers $N_A = \lambda_A T$ and $N_B = \lambda_B T$. Note that the
analytically derived estimators presented below are only expected to yield reliable
results in the limit of sufficiently large event numbers

$$N_A \gg 1 \quad \text{and} \quad N_B \gg 1. \qquad (7)$$


First, we analytically derive the statistics of precursor coincidence rates, extending
[25]. The probability for a (lagged) precursor coincidence between an A-event
and a preceding B-event is given by the probability

$$p = \frac{\Delta T}{T - \tau} \qquad (8)$$

that a B-event occurs randomly in a segment of length $\Delta T$ of the effective time
span of interest $T - \tau$. This follows from the null hypothesis of Poisson processes
generating the event series, where the probability for events to occur is the same at any
time instant and is independent of the occurrence of other events, resulting in a linear
dependence of $p$ on $\Delta T$.

Then the probability of a specific A-event to coincide with at least one of the $N_B$
B-events is given by

$$1 - (1 - p)^{N_B} = 1 - \left( 1 - \frac{\Delta T}{T - \tau} \right)^{N_B}. \qquad (9)$$

Note that when counting only exactly contemporaneous coincidences with $\Delta T = 0$,
$p = 0$ follows in the limit of a continuous time axis. However, in real-world data sets,
the time axis is often discrete, e.g. due to finite sampling intervals or finite numerical
precision. In this common case, $p = 1/(T - \tau)$ needs to be used in the following with
$T$ and $\tau$ measured in numbers of time steps instead of units of absolute time when
the interest is in coincidences with zero tolerance [32].

Based on this expression, we can calculate the probability $P(K; N_A, 1-(1-p)^{N_B})$
that exactly $K$ precursor coincidences are observed for a given realization of the two
Poisson processes. Even though A-events are assumed to be distributed independently
in the interval $[0, T]$, to proceed with the derivation we need to further assume that
A-type events are typically spaced much more widely than the coincidence interval
$\Delta T$, i.e.

$$\Delta T \ll T/N_A. \qquad (10)$$

When this condition (Eq. 10) is fulfilled, the events that a specific $A_i$-event coincides
with at least one B-event and that another specific $A_j$-event coincides with at least
one B-event can be considered statistically independent. Only then,
$P(K; N_A, 1-(1-p)^{N_B})$ is given by the binomial distribution with $N_A$ trials and a
success probability $1-(1-p)^{N_B}$ [33] and, hence,

$$P(K; N_A, 1-(1-p)^{N_B}) = \binom{N_A}{K} \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \right]^{K} \left[ \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \right]^{N_A-K}. \qquad (11)$$


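Using the relationship $K = r_p N_A$ to substitute $K$ by $r_p$ in the above equation
yields the distribution of precursor coincidence rates $P(r_p; N_A, 1-(1-p)^{N_B})$.

From the distribution (Eq. 11), the expectation value $\langle K \rangle$ and standard
deviation $\sigma(K)$ can be straightforwardly derived as

$$\langle K \rangle = N_A \left( 1 - (1-p)^{N_B} \right) = N_A \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \right] \qquad (12)$$

and

$$\sigma(K) = \sqrt{N_A \left( 1 - (1-p)^{N_B} \right) (1-p)^{N_B}} = \sqrt{N_A \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \right] \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B}}. \qquad (13)$$

This yields the expectation value of the precursor coincidence rate

$$\langle r_p \rangle = \frac{\langle K \rangle}{N_A} = 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \qquad (14)$$

and its standard deviation

$$\sigma(r_p) = \frac{\sigma(K)}{N_A} = \sqrt{\frac{1}{N_A} \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B} \right] \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_B}}. \qquad (15)$$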


The p-value of an observation $K_e$ with respect to the test distribution (Eq. 11), i.e.
the probability to obtain a number of coincidences $K$ larger than or equal to the
empirically observed number $K_e$, is then given by

$$P(K \ge K_e) = \sum_{K^* = K_e}^{N_A} P(K^*; N_A, 1-(1-p)^{N_B}). \qquad (16)$$
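
The analytical test for precursor coincidences can be sketched in a few lines that evaluate Eqs. (8), (9), (14), (15) and (16) directly. The function names and the parameter values in the example are hypothetical; the values were chosen such that the spacing condition of Eq. (10) is satisfied.

```python
from math import comb, sqrt


def precursor_null_stats(n_a, n_b, delta_t, T, tau=0.0):
    """Success probability, mean and standard deviation of r_p under the Poisson null."""
    p = delta_t / (T - tau)                 # Eq. (8)
    q = 1.0 - (1.0 - p) ** n_b              # Eq. (9): at least one of n_b B-events
    mean_rate = q                           # Eq. (14)
    std_rate = sqrt(q * (1.0 - q) / n_a)    # Eq. (15)
    return q, mean_rate, std_rate


def p_value(k_obs, n_trials, q):
    """P(K >= k_obs) for K ~ Binomial(n_trials, q), cf. Eq. (16)."""
    return sum(
        comb(n_trials, k) * q**k * (1.0 - q) ** (n_trials - k)
        for k in range(k_obs, n_trials + 1)
    )


# Hypothetical example: 20 A-events, 30 B-events, T = 600 time units, delta_t = 1.
q, mean_rp, std_rp = precursor_null_stats(n_a=20, n_b=30, delta_t=1.0, T=600.0)
pval = p_value(k_obs=5, n_trials=20, q=q)
```

For trigger coincidences the same functions can be reused with the roles of the two event numbers exchanged, i.e. with the exponent given by the number of A-events and the number of trials given by the number of B-events.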

The statistics of trigger coincidence rates for event series generated by Poisson
processes can be derived analogously by assuming a wide enough typical spacing of
B-events:

$$\Delta T \ll T/N_B. \qquad (17)$$

The distribution of the number of trigger coincidences $K$ is then given by

$$P(K; N_B, 1-(1-p)^{N_A}) = \binom{N_B}{K} \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_A} \right]^{K} \left[ \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_A} \right]^{N_B-K}, \qquad (18)$$

yielding the expectation value and standard deviation of the trigger coincidence rate

$$\langle r_t \rangle = 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_A} \qquad (19)$$

and

$$\sigma(r_t) = \sqrt{\frac{1}{N_B} \left[ 1 - \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_A} \right] \left( 1 - \frac{\Delta T}{T-\tau} \right)^{N_A}}, \qquad (20)$$

respectively. As above, the p-value of an empirically observed number of trigger
coincidences $K_e$ can then be written as

$$P(K \ge K_e) = \sum_{K^* = K_e}^{N_B} P(K^*; N_B, 1-(1-p)^{N_A}). \qquad (21)$$



In the case that the conditions (7), (10) and (17) are not met, Monte Carlo simulations
need to be applied to compute statistics of event coincidence analysis such
as the mean and standard deviation of coincidence rates or the significance level
(p-value) of an observed coincidence rate corresponding to the null hypothesis that the
empirical coincidence rate can be explained as the result of Poisson processes. To
illustrate this issue, we compare the expectation values and standard deviations of
trigger coincidence rates for Poisson processes derived from analytics and Monte Carlo
simulation for different relative coincidence intervals $\Delta T/T$ and numbers of
B-events $N_B$ (Fig. 2). Indeed, the statistics are only comparable if conditions (7) and
(17) are met, i.e. for $N_B \gg 1$ and $N_B \ll N_B^c(\Delta T/T) = (\Delta T/T)^{-1}$
(green line in Fig. 2), where $N_B^c(\cdot)$ denotes a critical value of $N_B$. Otherwise,
Monte Carlo simulations show that the analytical approximation tends to overestimate the
expected coincidence rate and its standard deviation, even though the approximations
(Eqs. 14, 15) show the correct asymptotic behavior of $r \to 0$ and $\sigma(r) \to 0$ for
$\Delta T/T \to 0$ and $r \to 1$ and $\sigma(r) \to 0$ for $\Delta T/T \to 1$.
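
The comparison between the analytical approximation and Monte Carlo simulation can be sketched as follows for trigger coincidence rates. The setting with 10 A-events, a unit interval and 1,000 trials follows the caption of Fig. 2; the number of B-events, the coincidence interval, the seed and all function names are assumptions made for this illustration.

```python
import random
from math import sqrt


def trigger_rate(t_a, t_b, delta_t, tau=0.0):
    """Fraction of B-events followed by at least one A-event, cf. Eq. (4)."""
    hits = sum(any(0 <= (ta - tau) - tb <= delta_t for ta in t_a) for tb in t_b)
    return hits / len(t_b)


def monte_carlo_trigger(n_a, n_b, delta_t, T=1.0, trials=1000, seed=0):
    """Mean and standard deviation of r_t for uniformly drawn event timings."""
    rng = random.Random(seed)
    rates = []
    for _ in range(trials):
        t_a = [rng.uniform(0.0, T) for _ in range(n_a)]
        t_b = [rng.uniform(0.0, T) for _ in range(n_b)]
        rates.append(trigger_rate(t_a, t_b, delta_t))
    mean = sum(rates) / trials
    std = sqrt(sum((r - mean) ** 2 for r in rates) / (trials - 1))
    return mean, std


def analytic_trigger(n_a, n_b, delta_t, T=1.0, tau=0.0):
    """Analytical approximation of mean and standard deviation, Eqs. (19) and (20)."""
    q = 1.0 - (1.0 - delta_t / (T - tau)) ** n_a
    return q, sqrt(q * (1.0 - q) / n_b)


mc_mean, mc_std = monte_carlo_trigger(n_a=10, n_b=20, delta_t=0.01)
an_mean, an_std = analytic_trigger(n_a=10, n_b=20, delta_t=0.01)
```

In this regime, where the coincidence interval is small compared to the typical spacing of B-events, both estimates should agree closely. Repeating the comparison with a larger coincidence interval or many more B-events is a simple way to explore where the approximation starts to deviate.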


2.3.2 Stochastic point processes with prescribed inter-event time distribution 

Compared to the Poisson processes discussed above, a more general null hypothesis
is that the observed values of coincidence rates can be explained by stochastic point
processes with a given distribution of inter-event times $P(\Delta t)$. For example, the
inter-event time distribution for Poisson processes with average event rate $\lambda$ is
given by the exponential distribution

$$P_1(\Delta t) = \lambda e^{-\lambda \Delta t}. \qquad (22)$$

Fig. 2. Comparison of the expected trigger coincidence rate $\langle r_t \rangle$ (A,B) and
its standard deviation $\sigma(r_t)$ (C,D) obtained using Monte Carlo simulation (A,C) and
an analytical approximation (B,D) depending on the relative coincidence interval
$\Delta T/T$ and the number of B-type events $N_B$. The analytical approximation is only
accurate in the regime $N_B \gg 1$ and $N_B \ll N_B^c(\Delta T/T) = (\Delta T/T)^{-1}$
(green line), where $N_B^c(\cdot)$ denotes a critical value of $N_B$. In this example, the
number of A-type events is $N_A = 10$, no lag is used ($\tau = 0$), events are
distributed in the unit interval of width $T = 1$ and $m = 1{,}000$ trials are used in the
Monte Carlo simulations for each considered combination of parameters.


However, it has been shown that many event time series display bursting behavior
associated with inter-event time distributions having more slowly decaying (heavy)
tails than the exponential distribution $P_1(\Delta t)$ [9,34]. For example, human violent
conflicts were reported to display universal bursting behavior [33] associated with
inter-event time distributions of the form

$$P_2(\Delta t) = \lambda F(\lambda \Delta t), \qquad (23)$$

where $F(\cdot)$ exhibits a power-law decay with exponent $\alpha$ such that

$$F(x) = a x^{-\alpha}, \qquad (24)$$

yielding

$$P_2(\Delta t) \propto \Delta t^{-\alpha}. \qquad (25)$$

Power-law inter-event time distributions with an exponential cutoff

$$P_3(\Delta t) = C (\lambda \Delta t)^{\gamma} e^{-\lambda \Delta t / \beta} \qquad (26)$$

have been reported to accurately describe the return time statistics of earthquakes
[55] and other types of event series.

While deriving analytical results for coincidence statistics based on these and 
other classes of point processes with prescribed P{At) remains the subject of future 
research, Monte Carlo simulations can be applied to obtain test distributions for 
assessing the statistical significance of empirically observed coincidence rates. Note 
that these tests can only be meaningfully applied in practice if either P{At) can be 
estimated well from the empirically observed inter-event time statistics requiring a 
sufficiently large number of events, or a good process understanding exists, i.e. the 
inter-event time distribution is known from theoretical considerations or observations 
from analogous systems. Alternatively, ensembles of surrogate event series can be 
generated by randomly shuffling inter-event time intervals given a large number of 
events. These conditions are not met for the application studied in Sect. 3, implying
the need for restricting the analysis to the null hypothesis of Poisson processes there. 
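
The surrogate approach based on randomly shuffling inter-event intervals can be sketched as below. The sketch keeps the first event fixed, so the last event time is conserved as well; this is a simplifying assumption of the illustration, as are the function names, the number of surrogates and the toy data, which contain far fewer events than would be required in practice.

```python
import random


def shuffle_surrogate(times, rng):
    """Surrogate series with the same inter-event intervals in random order."""
    times = sorted(times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    rng.shuffle(gaps)
    out = [times[0]]  # first event kept fixed (assumption of this sketch)
    for gap in gaps:
        out.append(out[-1] + gap)
    return out


def precursor_rate(t_a, t_b, delta_t, tau=0):
    """Fraction of A-events preceded by at least one B-event, cf. Eq. (3)."""
    hits = sum(any(0 <= (ta - tau) - tb <= delta_t for tb in t_b) for ta in t_a)
    return hits / len(t_a)


def surrogate_p_value(t_a, t_b, delta_t, m=200, seed=1):
    """Share of B-surrogates whose precursor rate reaches the observed one."""
    rng = random.Random(seed)
    observed = precursor_rate(t_a, t_b, delta_t)
    exceed = sum(
        precursor_rate(t_a, shuffle_surrogate(t_b, rng), delta_t) >= observed
        for _ in range(m)
    )
    return observed, exceed / m


t_b = [0, 1, 2, 10, 11, 50]  # hypothetical bursty B-event timings
t_a = [3, 12, 51]            # hypothetical A-event timings
observed, pval = surrogate_p_value(t_a, t_b, delta_t=1)
```

By construction, each surrogate conserves the empirical inter-event time distribution exactly while destroying the timing relation to the other series.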


2.3.3 Surrogate event series generated from time series surrogates 

In a number of relevant applications of event coincidence analysis, e.g. when study¬ 
ing climatological extreme events, event series are generated from underlying time 
series data. This transformation from time series to event data is typically achieved 
by thresholding to identify extreme events in the time series according to a prescribed 
quantile or some other form of filtering. In this case, various types of time series
surrogates [?] can be used to generate ensembles of event series for hypothesis testing
by applying the same transformation to original and surrogate time series data. For 
example, univariate iterative amplitude adjusted Fourier transform (iAAFT) surro¬ 
gates as implemented in [36] can be used to generate surrogate event series based on 
surrogate time series with the same amplitude distribution and autocorrelation func¬ 
tion as the original data. This procedure is useful for constructing suitable significance 
tests for event coincidence analysis when extreme events in the series of interest tend 
to cluster due to pronounced autocorrelation in the underlying time series data, as 
it was found to be the case for European temperature, precipitation, tree ring width 
and simulated net primary productivity (NPP) [4]. Along these lines, bivariate event
series surrogates derived from bivariate iAAFT time series surrogates could be used 
for testing the null hypothesis that observed coincidence rates can be explained by 
the co-occurrence of extremes due to the conserved linear cross-correlation structure 
of the underlying pair of time series. 
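
The transformation from time series to event series, applied identically to original and surrogate data, can be sketched as follows. Note that the surrogate used here is a plain random permutation, which conserves the amplitude distribution but destroys the autocorrelation; it is a deliberately simpler stand-in and not the iAAFT procedure mentioned above. The quantile rule, function names and data are likewise assumptions of this illustration.

```python
import random


def events_above_quantile(series, q):
    """Time indices whose value reaches the empirical q-quantile (order-statistic rule)."""
    ranked = sorted(series)
    threshold = ranked[min(int(q * len(ranked)), len(ranked) - 1)]
    return [i for i, x in enumerate(series) if x >= threshold]


def shuffled_surrogate(series, rng):
    """Random permutation: same amplitude distribution, no autocorrelation."""
    surrogate = list(series)
    rng.shuffle(surrogate)
    return surrogate


series = [3, 9, 1, 7, 5, 0, 8, 2, 6, 4]        # hypothetical time series
events = events_above_quantile(series, q=0.8)  # indices of the two largest values
rng = random.Random(42)
surrogate_events = [
    events_above_quantile(shuffled_surrogate(series, rng), q=0.8) for _ in range(5)
]
```

Each surrogate event series contains the same number of events as the original one, so coincidence rates computed from the ensemble can serve as a test distribution for the observed rate.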

2.4 Related methods 

The complex systems-inspired framework of event coincidence analysis presented
above is conceptually related to measures from spatial statistics for the correlation of
spatial and spatio-temporal point processes [?] such as Ripley's cross-K [?] as
well as various forms of regression analysis for point process data. Another popular
related approach is considering measures of event synchronization for quantifying the
similarity of event series [?]. Hence, the considerations on surrogate event
series and significance tests given above could be applied to the latter concept as well,
given that the requirements and basic assumptions are met. However, it should be
noted that event synchronization lacks the distinction between coincidence interval
$\Delta T$ and lag parameter $\tau$ provided by event coincidence analysis, and also does
not allow one to distinguish the cases of precursor and trigger coincidences.

While the statistical theory of temporal point processes appears generally less
consolidated than the theory of standard time series [5], a multitude of methodologies for
studying statistical interrelationships between event time series have been developed
in the neurosciences in the last decades, focussing on the specific, but important,
application to neural spike trains. These techniques include methods focussing on the
distributions of relative waiting times of events in series A with respect to events in
series B [38], cross-correlograms and cross-intensity functions as well as
frequency-domain methods, neural spike train decoding or information-theoretical
methods [?]. Certainly, this wealth of alternative methodologies holds great potential for
fruitful applications in other fields of science, considering, for example, event series of
climatological extreme events and natural disasters.

It should also be mentioned that the statistical and mathematical literature
contains a large number of less closely related studies of coincidences, e.g. considering
the birthday problem [39]. The term coincidence analysis is also used in different contexts
in fields such as elementary particle physics [30] or in the identification of causal
dependencies in configurational data [41]. This is why we choose to use the more specific
term event coincidence analysis when referring to the methodology introduced in this
paper.


3 Application: extreme flood events as possible drivers of epidemics 

To illustrate the capabilities of event coincidence analysis, we apply it here to analyze
the interrelations between two types of event time series of natural disasters, for which
a causal relation is commonly assumed in the literature: hydrological flooding events
(B-events) and outbreaks of epidemics (A-events) [32]. Our analysis is performed
on the EmDAT database covering the time interval 1950-2009 at monthly time
resolution [33]. This database contains 3,468 flood events worldwide, defined
as a significant rise of water level in a stream, lake, reservoir or coastal region, as well
as 1,152 epidemic outbreaks, defined as either an unusual increase in the number of
cases of an infectious disease that already exists in the region or population concerned,
or the appearance of an infectious disease previously absent from a region. For each
country $k$ in the database, a pair of event series is available containing $N_{F,k}$ flood
events and $N_{E,k}$ epidemic outbreaks, respectively.

As described above, event coincidence analysis allows for two different test setups:
in the first setup, we test on the basis of the occurrence of epidemic outbreaks and
perform a coincidence test with flood events preceding epidemic outbreaks within a
given coincidence interval (statistics based on precursor coincidences). Since we
analyze coincidences based on the condition that an epidemic outbreak has occurred,
this setup may also be termed a risk enhancement test [5]. In the second case, we
perform the event coincidence analysis on the basis of the occurrence of flood events that
are followed by epidemic outbreaks (statistics based on trigger coincidences). We call
this the trigger test [5], since it investigates a possible causal direction of flood events
triggering epidemic outbreaks. In the following, we do not consider additional time
lags between different types of events that are not covered already by the coincidence
interval $\Delta T$ and, hence, set $\tau = 0$ (see Fig. 1). Furthermore, we compute
aggregated coincidence rates covering all countries in the data base (Sect. 2.2) as well as
coincidence rates on a country-wise basis (Sect. 2.1).

To test for statistical significance with the null hypothesis (NH) that the observed
coincidences can be explained on the basis of event series generated by Poisson processes
with the empirically observed event rates, Monte Carlo simulation is applied
to generate pairs of surrogate event time series with conserved event numbers on an
individual country basis by uniformly and independently drawing event
timings over the full analysis period 1950-2009. We generate m = 1,000 ensemble
members for each country, and significance levels of 95 % and 99 % are applied for the
rejection of the NH.

Fig. 3. Results of event coincidence analysis for the flood and epidemic outbreak event
time series: Aggregated precursor (A) and trigger (B) coincidence rates. Dotted (dashed)
grey lines mark the 95 % (99 %) significance level determined by Monte Carlo simulations.
Coincidence rates that are significant at 95 % (99 %) levels are highlighted by bold markers.
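
The Monte Carlo test can be sketched in Python as follows. This is only an illustration under stated assumptions: event times are integer month indices drawn uniformly with replacement, the aggregated rate is taken to be the pooled ratio of coincidences to floods over all countries (the precise definition is the one given in Sect. 2 and may differ), and the event counts are made up.

```python
import numpy as np

def n_trigger_coincidences(floods, epidemics, delta_t):
    """Number of floods followed by >= 1 epidemic outbreak in [t, t + delta_t]."""
    epidemics = np.sort(np.asarray(epidemics))
    floods = np.asarray(floods)
    first = np.searchsorted(epidemics, floods, side="left")
    last = np.searchsorted(epidemics, floods + delta_t, side="right")
    return int(np.sum(last > first))

def aggregated_trigger_rate(pairs, delta_t):
    """Pooled rate over countries: total coincidences / total floods (assumed)."""
    hits = sum(n_trigger_coincidences(f, e, delta_t) for f, e in pairs)
    n_floods = sum(len(f) for f, _ in pairs)
    return hits / n_floods if n_floods else float("nan")

def surrogate_rates(event_counts, n_months, delta_t, m=1000, seed=0):
    """Null distribution of the aggregated rate from m surrogate ensembles.

    event_counts holds one (n_floods, n_epidemics) tuple per country; each
    surrogate conserves these numbers and draws the months uniformly at random.
    """
    rng = np.random.default_rng(seed)
    rates = np.empty(m)
    for i in range(m):
        pairs = [(rng.integers(0, n_months, size=n_f),
                  rng.integers(0, n_months, size=n_e))
                 for n_f, n_e in event_counts]
        rates[i] = aggregated_trigger_rate(pairs, delta_t)
    return rates

# Hypothetical event counts for three countries over 1950-2009 (720 months).
counts = [(40, 12), (5, 1), (90, 30)]
null = surrogate_rates(counts, n_months=720, delta_t=0)
threshold_95, threshold_99 = np.quantile(null, [0.95, 0.99])
print(threshold_95, threshold_99)
```

In this sketch, an observed aggregated rate would be called significant at the 99 % level if it exceeds `threshold_99`.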

Figure 3 displays coincidence rates aggregated over all countries with available
event data for coincidence intervals ΔT ranging from 0 to 24 months. While ΔT = 0
implies considering coincidences within the same month, a 24-month window counts
coincidences between 0 and 24 months after (before) a flooding (epidemic outbreak)
event, respectively. For the risk enhancement test, we find that about 20% of all
epidemic outbreaks have been preceded by a flooding event within a month before
the outbreak. Our corresponding results indicate that floods robustly contribute to
the outbreak risk (Fig. 3A). While no direct causal attribution is possible based on this
test, the trigger test (Fig. 3B) also robustly suggests a possible causal relationship.
We find that about 7% of all flooding events have been followed by an epidemic
outbreak in the next month, which is significant at the 99 % level.

These results are also robust for other small coincidence intervals. In contrast, for
window widths ΔT exceeding 3 months, we do not find indications that the NH
of the aggregated coincidence rate arising by chance can be rejected. However, this
changes for window lengths between 9 and 13 months, where the NH can be rejected
at least at the 95% level, indicating a robust long-range interrelation between the
two types of events. In fact, more than 50 % of all epidemic outbreaks have been
preceded by a flooding event over a 12-month coincidence interval and about 20 % of
all flooding events have possibly triggered epidemic outbreaks within the 12 months
following the natural disaster.

It should be noted that although we perform multiple hypothesis tests for varying
coincidence intervals ΔT, standard corrections of the significance level to account for
these multiple comparisons such as Bonferroni adjustments are not applicable in our
case. This is particularly true for the corresponding universal null hypothesis that no
statistical relationship exists between flood events and epidemic outbreaks for any
of the ΔT [44,45]. In contrast, the two detected clusters of ΔT with statistically
significant rates for both precursor and trigger coincidences around monthly and
annual time scales indicate the existence of robust coincidence relationships that
are present in the data. To further investigate the robustness of these findings, we
performed Monte Carlo simulations to assess the probability that the null hypothesis
is falsely rejected for fixed pairs of Poisson data surrogates for n values of ΔT. We
find that the probability of observing n = 4 falsely rejected tests at a significance level
of 99% (compare Fig. 3) in this setting is less than 0.001, implying that the results
presented in Fig. 3 can be considered highly statistically significant when taking the
effects of multiple testing in the specific setting of our study into account.
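
One way such a multiple-testing check could be set up is sketched below in Python. It is a simplification under stated assumptions, not the procedure used for the reported numbers: it treats a single pair of event series rather than the country-aggregated rates, uses made-up event counts, and estimates the probability of at least n_min false rejections. All names are illustrative.

```python
import numpy as np

def trigger_rate(floods, epidemics, delta_t):
    """Share of floods followed by >= 1 epidemic outbreak in [t, t + delta_t]."""
    epidemics = np.sort(epidemics)
    first = np.searchsorted(epidemics, floods, side="left")
    last = np.searchsorted(epidemics, floods + delta_t, side="right")
    return np.mean(last > first)

def rate_profile(rng, n_floods, n_epidemics, n_months, delta_ts):
    """Rates for all delta_t values, computed from ONE fixed surrogate pair."""
    floods = rng.integers(0, n_months, size=n_floods)
    epidemics = rng.integers(0, n_months, size=n_epidemics)
    return np.array([trigger_rate(floods, epidemics, dt) for dt in delta_ts])

def prob_false_rejections(n_floods, n_epidemics, n_months, delta_ts, n_min,
                          level=0.99, m_ref=500, m_test=500, seed=0):
    """Estimate P(at least n_min of the tests reject) when the NH is true."""
    rng = np.random.default_rng(seed)
    # Reference ensemble: one significance threshold per delta_t.
    ref = np.array([rate_profile(rng, n_floods, n_epidemics, n_months, delta_ts)
                    for _ in range(m_ref)])
    thresholds = np.quantile(ref, level, axis=0)
    # Independent test ensemble: count rejections per surrogate pair.
    test = np.array([rate_profile(rng, n_floods, n_epidemics, n_months, delta_ts)
                     for _ in range(m_test)])
    n_rejected = np.sum(test > thresholds, axis=1)
    return float(np.mean(n_rejected >= n_min))

delta_ts = np.arange(0, 25)  # 0 to 24 months, as in the analysis above
p = prob_false_rejections(50, 20, 720, delta_ts, n_min=4)
print(p)
```

Because the windows for increasing ΔT are nested, the tests for neighbouring ΔT are strongly dependent, which is why the rejections are counted per fixed surrogate pair rather than treated as independent trials.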

The global frequencies of flood events and epidemics are depicted in Fig. 4A,B, and
the country-wise trigger coincidence rates for coincidences within the same month and
a 12-month coincidence interval in Fig. 4C,D. While these maps give some guidance
on where floods may have triggered epidemic outbreaks, they need to be interpreted 
with great caution, since no information about the statistical significance of these rates 
is conveyed. Since coincidence rates are plotted, countries with very limited statistics 
(e.g. only one or very few events) can still exhibit high individual coincidence rates. 
However, several epidemic-prone regions such as parts of South America, South-East 
Asia, India and Sub-Saharan Africa are highlighted as having substantial trigger 
coincidence rates for both coincidence intervals. At the same time, these maps also 
illustrate a limitation of the tests performed here, since the data is provided on a 
country-wise resolution, while the considered events, in particular floods, are bound
to geographical regions and watersheds. This is particularly problematic for larger
countries, where a sub-country resolution would be needed to ensure at least the 
possibility of a causal relation between the event time series. While this represents 
a clear limitation, it does not affect the significance of our results. The reason is 
that inclusion of causally unrelated events on a country basis can only increase the 
probability of coincidences occurring by chance, thereby increasing the significance 
levels and rendering the test more conservative. 
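
The caveat about limited statistics can be made concrete with a small Python sketch that reports each country's rate together with the number of events it rests on. The data, the function names and the cut-off `min_floods` are made-up illustrations, not values from the study.

```python
from bisect import bisect_left, bisect_right

def trigger_rate(floods, epidemics, delta_t):
    """Share of floods followed by >= 1 epidemic outbreak in [t, t + delta_t]."""
    epidemics = sorted(epidemics)
    hits = sum(
        bisect_right(epidemics, t + delta_t) > bisect_left(epidemics, t)
        for t in floods
    )
    return hits / len(floods)

def country_table(events, delta_t, min_floods=10):
    """Per-country rate, flood count, and a flag for sparse statistics.

    min_floods is an arbitrary illustrative cut-off, not a value from the study.
    """
    table = {}
    for country, (floods, epidemics) in events.items():
        if not floods:
            continue  # no floods: the trigger rate is undefined
        rate = trigger_rate(floods, epidemics, delta_t)
        table[country] = (rate, len(floods), len(floods) < min_floods)
    return table

# Made-up data: country "X" has a single flood that happens to coincide.
events = {
    "X": ([100], [100]),
    "Y": (list(range(0, 600, 30)), [31, 95, 400]),
}
for country, (rate, n, sparse) in country_table(events, delta_t=1).items():
    print(country, round(rate, 2), n, "sparse" if sparse else "")
```

Here the hypothetical country "X" reaches a rate of 1.0 from a single event pair, whereas "Y" has a rate of 0.05 based on 20 floods, which illustrates why a map of rates alone says little about significance.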

Besides being the most common natural disasters, floods are the leading cause of
natural disaster fatalities worldwide: Doocy et al. [46] estimate that global fatalities
directly due to flood events exceeded half a million for the period 1980-2009. At the
same time, flood events are also found to increase the risk of outbreaks of fecal-oral,
vector-borne and rodent-borne diseases [47]. However, the interrelation between
floods and disease outbreaks is found to be complex and strongly case-dependent
m and, as a consequence, difficult to assess in an aggregated fashion using classical 
statistical methods. The event-based event coincidence analysis applied here provides
a methodological alternative by assessing the statistical interrelationships between
the two types of event time series on a case-to-case basis. In line with a systematic
review of the literature on floods and human health [48], we report robust evidence for
both short-term and long-term impacts of floods on epidemic outbreaks. Specifically,
we find that more than 50% (20%) of all epidemic outbreaks have been preceded
by a flooding event within a 12(1)-month(s) window before the outbreak and that
about 20% (7%) of all floods might have triggered such an outbreak in the 12(1)
month(s) following the natural disaster. Our results indicate statistically significant
coincidence rates up to three months following the disaster and then between 9 and
12 months afterwards, indicating the importance of seasonal effects, which shall be
further studied in future work. In tropical regions in particular, flooding events are tied
to the rainy season, as are major drivers of, in particular, vector-borne diseases [49]. Thus,
while the significant short-term coincidence rates might to a large extent be a direct
consequence of the flooding events, indirect effects will likely dominate the long-term
coincidence rates observed, e.g. through impacts on general health, food systems and
livelihoods exacerbating poverty and potentially malnutrition that increase long-term
susceptibility to diseases [15]. It is important to note, however, that the clustering
of floods and epidemics during the rainy season in tropical countries could lead to
statistically significant long-term coincidence rates due to the counting of causally
unrelated events in successive rainy seasons. This effect should be controlled for in
future studies.

Fig. 4. Global mapping of the frequency of floods (A) and epidemics (B) between 1950 and
2009. Country-wise trigger coincidence rates are shown for coincidences occurring within the
same month (C) as well as a coincidence interval of 12 months (D). Gray fillings indicate a
lack of data for the corresponding countries.
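
One conceivable way to control for such seasonal clustering, which was not applied in this study, is to use surrogates that preserve the calendar month of every event and randomize only the year. The Python sketch below illustrates this idea under the assumption that events are stored as integer month indices with index 0 corresponding to January of the first year. The function name and the data are hypothetical.

```python
import numpy as np

def seasonal_surrogate(months, n_years, rng):
    """Keep each event's calendar month, redraw its year uniformly at random."""
    months = np.asarray(months)
    calendar_month = months % 12          # 0 = January, ..., 11 = December
    new_year = rng.integers(0, n_years, size=months.size)
    return np.sort(12 * new_year + calendar_month)

rng = np.random.default_rng(0)
floods = np.array([6, 18, 19, 30, 702])   # made-up month indices, 0 = Jan 1950
surrogate = seasonal_surrogate(floods, n_years=60, rng=rng)
print(surrogate % 12)  # same multiset of calendar months as the original
```

Coincidence rates computed from such surrogates would retain the seasonal cycle of both event types, so that any remaining excess of coincidences could not be attributed to a shared rainy season alone.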

Given the projected increase in flood risk under anthropogenic climate change 
[SOU], our findings highlight the risk of such natural disasters for human health and 
call for an integrated view on climate and health risks in adaptation efforts. As a 
note of caution, we would like to stress again the illustrative nature of the results 
of event coincidence analysis presented for this particular application. More detailed 
analyses including additional and independent data bases and taking into account 
systematic effects such as biases induced by changes in self-reporting behavior as 
have been reported for EmDAT and other data bases [52] are relevant subjects of
future research. 


4 Conclusions 

In this work, we have introduced event coincidence analysis as a method for investigating
statistical interrelationships between event time series such as climate extremes,
natural disasters or civil conflicts and other sources of event-like data. Event
coincidence analysis builds upon already established methodologies such as event
synchronization or measures of correlation between spatial point processes and allows
one to quantify the strength (via the coincidence rate), directionality (by distinguishing
precursor and trigger coincidences) and lag of such interrelationships. Statistical significance
tests for these properties have been proposed based on different kinds of null
hypotheses on the nature of the temporal point processes underlying the event series, 
including Poisson processes and stochastic point processes with a given inter-event 
time distribution. 

As an exemplary application in the timely context of global anthropogenic climate
change, we have employed event coincidence analysis for studying statistical
interrelationships between flood events and epidemic outbreaks in the same country 
on a globally aggregated level. We have found evidence that flood events may have 
acted as possible drivers of epidemic outbreaks in the past, underlining this potential 
causal relationship as an important subject of further studies in climate impact and 
adaptation research. 

Promising further methodological developments include the design and more detailed
mathematical analysis of appropriate null hypotheses for event coincidence
analysis including analytical derivations of the corresponding test statistics as well
as the incorporation of event amplitude information [53], i.e. by considering marked
point processes. Spatial information could be taken into account more explicitly than
is the case for the aggregated coincidence rates studied in this paper, building upon 
a notion of spatio-temporal coincidences with links to the theory of spatial m and 
spatio-temporal point processes [28]. Furthermore, multivariate extensions such as
partial or conditional event coincidence analysis [54] for measuring statistical interrelations
between two event series conditional on a third or even more event series,
e.g. methods extending upon the PC-algorithm and its variants [55], would allow one to
extract further information from rich sources of event data in hypothesis-driven as
well as exploratory research modes [5].


This research was performed in the context of flagship project COPAN on Coevolutionary 
Pathways in the Earth system and the BMBF Young Investigators Group “Complex Systems 
Approaches to Understanding Causes and Consequences of Past, Present and Future Climate 
Change” at the Potsdam Institute for Climate Impact Research. We appreciate funding by a 
Humboldt University / IRI THESys fellowship, the Stordalen Foundation (via the Planetary 
Boundary Research Network PB.net), the Earth League’s EarthDoc program, the German 
Federal Ministry for Education and Research (BMBF projects GLUES and CoSy-CC² (grant
no. 01LN1306A)) and the Evangelisches Studienwerk Villigst. The work was supported by the 
German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety
(11-II-093-Global-A SIDS and LDCs). Jobst Heitzig, Marc Wiedermann and Miguel Mahecha are
acknowledged for helpful insights and discussions at various stages of the reported research. 
Event coincidence analyses can be performed using the R package CoinCalc [26], which is
available at https://github.com/JonatanSiegmund/CoinCalc.